Data Reading and Processing: In this step,the dataset provided was read.Timestamp column were arranged as the index.To fill the NaN values fillna operation of pandas were implemented.Methods of backfill and forwardfill were used in order not to leave any empty cells.MinMax scaling were implemented in order to scale the data to same interval for every respective column.First five values of the original dataset are provided.

In [229]:
#libraries used and data reading
import pandas as pd
import numpy as np
import datetime
import time as tm
import pytz
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller
from sklearn.preprocessing import MinMaxScaler
sns.set()
#Reading the data and filling nan values
df_read=pd.read_csv("C:/Users/kteke/OneDrive/Desktop/all_ticks_wide.csv")
df_original=df_read
df_read.timestamp = pd.to_datetime(df_read.timestamp)
df_read.set_index('timestamp', inplace=True)
df_read=df_read.fillna(method='bfill')
df_read=df_read.fillna(method='ffill')
scaler=MinMaxScaler()
scaler.fit(df_read.values)
scaled_data=scaler.transform(df_read.values)
df=df_read
df.iloc[:,:]=scaled_data
df_original.head()
Out[229]:
AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
timestamp
2012-09-17 06:45:00+00:00 22.3978 5.2084 1.7102 3.87 1.4683 1.1356 1.0634 6.9909 2.9948 2.4998 ... 4.2639 0.96 29.8072 1.0382 3.8620 1.90 0.4172 2.5438 2.2619 0.7789
2012-09-17 07:00:00+00:00 22.3978 5.1938 1.7066 3.86 1.4574 1.1275 1.0634 6.9259 2.9948 2.5100 ... 4.2521 0.96 29.7393 1.0382 3.8529 1.90 0.4229 2.5266 2.2462 0.7789
2012-09-17 07:15:00+00:00 22.3978 5.2084 1.7102 NaN 1.4610 1.1356 1.0679 6.9909 2.9855 2.4796 ... 4.2521 0.97 29.6716 1.0463 3.8436 1.91 0.4229 2.5266 2.2566 0.7789
2012-09-17 07:30:00+00:00 22.3978 5.1938 1.7102 3.86 1.4537 1.1275 1.0679 6.9584 2.9855 2.4897 ... 4.2521 0.97 29.7393 1.0382 3.8529 1.91 0.4286 2.5324 2.2619 0.7860
2012-09-17 07:45:00+00:00 22.5649 5.2084 1.7102 3.87 1.4574 1.1356 1.0725 6.9909 2.9760 2.4897 ... 4.2521 0.97 29.8072 1.0382 3.8620 1.90 0.4286 2.5324 2.2619 0.7789

5 rows × 60 columns

Data Visualization: Obtained data were plotted with matplotlibs plot() function.As the number of variables are 60 ,to better illustrate the plots 20 plots with 3 different stock data within each were constructed.As it can be clearly seen from the plots,there is trend in nearly all of the variables which will affect how the data analysis will be conducted.

In [230]:
#Visualize data
for i in range(20):
    df.iloc[:,i*3:(i+1)*3].plot(figsize=(20,10), linewidth=2, fontsize=20)
    plt.xlabel('date', fontsize=20);

Density Plot for the Dataset: For every variable a density plot were constructed.These plots were constructed only for visualization purposes.As there is trend within data ,meaningful results cannot be obtained from them.

In [231]:
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(df.columns[j*10:(j+1)*10]):
        sns.kdeplot(df[col], ax=ax[i]) 
        ax[i].set_title(col)
    fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()

Density Plot for the Dataset(First Difference): To detrend the data first difference was taken and corresonding density plots of the variables are provided below.Visual inspection proposes that first difference of a given variable follows a Laplace distribution.However two density plots suggests the underlying distribution is multimodal.These are prime candidates for correlation analysis.Multimodal distribution suggests multiple stochastic processes as the underlying source so in correlation analysis there could be combination of different stochastic processes coinciding with each other which in theory would create different correlation patterns in different time intervals.'TUKAS' and 'ALBRK are the stocks that yield multinomal distribution

In [232]:
df_dif=df.diff()
df_dif=df_dif.fillna(method='bfill')
df_dif_num=df_dif
for j in range(0,10):
    fig, axes = plt.subplots(3,2, figsize=(30, 10))
    ax = axes.flatten()
    for i, col in enumerate(df_dif_num.columns[j*6:(j+1)*6]):
        sns.kdeplot(df_dif_num[col], ax=ax[i]) 
        ax[i].set_title(col)
    plt.show()

Histogram of Negative and Positive values: A histogram depicting number of positive,negative and nill values was constructed for each variable.Discrepencies between positive and negative differences were searched for.No valuable visual evidence that suggests there is any stocks that favors a certain direction was obtained.

In [233]:
df_dif[df_dif > 0] = 1
df_dif[df_dif < 0] = -1
df_dif[df_dif == 0] = 0
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(df_dif.columns[j*10:(j+1)*10]):
        sns.histplot(df_dif[col], ax=ax[i]) 
        ax[i].set_title(col)
    #fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()

Plots for first difference and Second difference: First and second difference plots for the variables was constructed.As it can be seen, there is a steep spike in difference between 2013 and 2014.This date is to be determined later and used in google trend analysis.Also higher vaolatility of prices can be observed after 2017.

In [234]:
for i in range(20):
    df.diff().iloc[:,i*3:(i+1)*3].plot(figsize=(20,10), linewidth=2, fontsize=20)
    plt.xlabel('date', fontsize=20)
In [235]:
df_dif_2=df.diff().diff()
df_dif_2=df_dif_2.fillna(method='bfill')
for i in range(20):
    df.diff().diff().iloc[:,i*3:(i+1)*3].plot(figsize=(20,10), linewidth=0.5, fontsize=20)
    plt.xlabel('date', fontsize=20)

Stationary Tests: Augmented Dickey Fuller Test was implemented for first difference,second difference and raw data.Results are provided below.If our p value is smaller than 0.05 we reject the null hypothesis which means our series are stationary H0:The series in question are non stationary H1:The series in question are stationary The results of the ADF test suggests even under second difference stationarity assumption is under jeopardy.This puts all the analysis obtained in this hw in a vague position.

In [236]:
#Statinary test for first differences
from statsmodels.tsa.stattools import adfuller
first_difference_stationary_test=[]
for i in range(df_dif.values.shape[1]):
    result=adfuller(df_dif.values[i])
    first_difference_stationary_test.append(result)
    print('Column no: %f' %i)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
Column no: 0.000000
ADF Statistic: -8.099270
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 1.000000
ADF Statistic: -8.099270
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 2.000000
ADF Statistic: -4.216730
p-value: 0.000617
Critical Values:
	1%: -3.560
	5%: -2.918
	10%: -2.597
Column no: 3.000000
ADF Statistic: -4.077874
p-value: 0.001053
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 4.000000
ADF Statistic: -5.995265
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 5.000000
ADF Statistic: -8.936146
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 6.000000
ADF Statistic: -3.319504
p-value: 0.014029
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column no: 7.000000
ADF Statistic: -6.152137
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 8.000000
ADF Statistic: -9.456608
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 9.000000
ADF Statistic: -8.314044
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 10.000000
ADF Statistic: -8.523931
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 11.000000
ADF Statistic: -8.244489
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 12.000000
ADF Statistic: -7.300131
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 13.000000
ADF Statistic: -1.769749
p-value: 0.395580
Critical Values:
	1%: -3.575
	5%: -2.924
	10%: -2.600
Column no: 14.000000
ADF Statistic: -6.793439
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 15.000000
ADF Statistic: -6.826620
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 16.000000
ADF Statistic: -1.511662
p-value: 0.527735
Critical Values:
	1%: -3.575
	5%: -2.924
	10%: -2.600
Column no: 17.000000
ADF Statistic: -7.831539
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 18.000000
ADF Statistic: -9.016590
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 19.000000
ADF Statistic: -4.174619
p-value: 0.000727
Critical Values:
	1%: -3.568
	5%: -2.921
	10%: -2.599
Column no: 20.000000
ADF Statistic: -6.712533
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 21.000000
ADF Statistic: -7.684404
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 22.000000
ADF Statistic: -6.917474
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 23.000000
ADF Statistic: -3.603114
p-value: 0.005702
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 24.000000
ADF Statistic: -10.687146
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 25.000000
ADF Statistic: -7.601218
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 26.000000
ADF Statistic: -9.366595
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 27.000000
ADF Statistic: -7.434416
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 28.000000
ADF Statistic: -7.008647
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 29.000000
ADF Statistic: -5.593441
p-value: 0.000001
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column no: 30.000000
ADF Statistic: -7.245293
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 31.000000
ADF Statistic: -6.770616
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 32.000000
ADF Statistic: -6.574993
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 33.000000
ADF Statistic: -8.880827
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 34.000000
ADF Statistic: -4.176622
p-value: 0.000722
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 35.000000
ADF Statistic: -9.334043
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 36.000000
ADF Statistic: -8.918422
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 37.000000
ADF Statistic: -6.661370
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 38.000000
ADF Statistic: -5.499863
p-value: 0.000002
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column no: 39.000000
ADF Statistic: -7.318254
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 40.000000
ADF Statistic: -1.919012
p-value: 0.323171
Critical Values:
	1%: -3.563
	5%: -2.919
	10%: -2.597
Column no: 41.000000
ADF Statistic: -8.188024
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 42.000000
ADF Statistic: -8.234837
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 43.000000
ADF Statistic: -4.779133
p-value: 0.000060
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 44.000000
ADF Statistic: -8.717417
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 45.000000
ADF Statistic: -8.248489
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 46.000000
ADF Statistic: -4.014292
p-value: 0.001337
Critical Values:
	1%: -3.558
	5%: -2.917
	10%: -2.596
Column no: 47.000000
ADF Statistic: -2.285509
p-value: 0.176684
Critical Values:
	1%: -3.555
	5%: -2.916
	10%: -2.596
Column no: 48.000000
ADF Statistic: -8.361810
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 49.000000
ADF Statistic: -7.769269
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 50.000000
ADF Statistic: -7.310824
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 51.000000
ADF Statistic: -5.472582
p-value: 0.000002
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 52.000000
ADF Statistic: -8.695394
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 53.000000
ADF Statistic: -8.750652
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 54.000000
ADF Statistic: -7.443134
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 55.000000
ADF Statistic: -8.205769
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 56.000000
ADF Statistic: -2.824305
p-value: 0.054885
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column no: 57.000000
ADF Statistic: -3.037337
p-value: 0.031552
Critical Values:
	1%: -3.555
	5%: -2.916
	10%: -2.596
Column no: 58.000000
ADF Statistic: -7.081131
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column no: 59.000000
ADF Statistic: -7.180547
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
In [237]:
#Statinary test for second differences
from statsmodels.tsa.stattools import adfuller
second_difference_stationary_test=[]
for i in range(df_dif_2.shape[1]):
    result=adfuller(df_dif_2.values[i])
    second_difference_stationary_test.append(result)
    print('Column2 no: %f' %i)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
Column2 no: 0.000000
ADF Statistic: -7.500367
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 1.000000
ADF Statistic: -7.500367
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 2.000000
ADF Statistic: -7.500367
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 3.000000
ADF Statistic: -3.651316
p-value: 0.004852
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column2 no: 4.000000
ADF Statistic: -5.505116
p-value: 0.000002
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 5.000000
ADF Statistic: -7.626145
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 6.000000
ADF Statistic: -9.723641
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 7.000000
ADF Statistic: -5.739843
p-value: 0.000001
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 8.000000
ADF Statistic: -7.857847
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 9.000000
ADF Statistic: -8.194446
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 10.000000
ADF Statistic: -9.136396
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 11.000000
ADF Statistic: -5.999902
p-value: 0.000000
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column2 no: 12.000000
ADF Statistic: -9.886147
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 13.000000
ADF Statistic: -5.778694
p-value: 0.000001
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column2 no: 14.000000
ADF Statistic: -6.969938
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 15.000000
ADF Statistic: -7.708578
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 16.000000
ADF Statistic: -1.458240
p-value: 0.554059
Critical Values:
	1%: -3.571
	5%: -2.923
	10%: -2.599
Column2 no: 17.000000
ADF Statistic: -2.727714
p-value: 0.069364
Critical Values:
	1%: -3.566
	5%: -2.920
	10%: -2.598
Column2 no: 18.000000
ADF Statistic: -8.394921
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 19.000000
ADF Statistic: -7.917404
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 20.000000
ADF Statistic: -6.867701
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 21.000000
ADF Statistic: -8.623271
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 22.000000
ADF Statistic: -7.760141
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 23.000000
ADF Statistic: -7.131561
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 24.000000
ADF Statistic: -3.888958
p-value: 0.002118
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column2 no: 25.000000
ADF Statistic: -11.458508
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 26.000000
ADF Statistic: -10.361013
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 27.000000
ADF Statistic: -7.690150
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 28.000000
ADF Statistic: -8.239476
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 29.000000
ADF Statistic: -4.167237
p-value: 0.000748
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column2 no: 30.000000
ADF Statistic: -4.880950
p-value: 0.000038
Critical Values:
	1%: -3.555
	5%: -2.916
	10%: -2.596
Column2 no: 31.000000
ADF Statistic: -4.816709
p-value: 0.000051
Critical Values:
	1%: -3.553
	5%: -2.915
	10%: -2.595
Column2 no: 32.000000
ADF Statistic: -6.927399
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 33.000000
ADF Statistic: -7.618179
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 34.000000
ADF Statistic: -4.672030
p-value: 0.000095
Critical Values:
	1%: -3.575
	5%: -2.924
	10%: -2.600
Column2 no: 35.000000
ADF Statistic: -8.484573
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 36.000000
ADF Statistic: -9.206054
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 37.000000
ADF Statistic: -8.877339
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 38.000000
ADF Statistic: -2.696517
p-value: 0.074643
Critical Values:
	1%: -3.558
	5%: -2.917
	10%: -2.596
Column2 no: 39.000000
ADF Statistic: -6.197472
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column2 no: 40.000000
ADF Statistic: -6.154092
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column2 no: 41.000000
ADF Statistic: -4.485382
p-value: 0.000209
Critical Values:
	1%: -3.563
	5%: -2.919
	10%: -2.597
Column2 no: 42.000000
ADF Statistic: -7.575952
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 43.000000
ADF Statistic: -4.154510
p-value: 0.000786
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column2 no: 44.000000
ADF Statistic: -9.957188
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 45.000000
ADF Statistic: -3.569568
p-value: 0.006370
Critical Values:
	1%: -3.555
	5%: -2.916
	10%: -2.596
Column2 no: 46.000000
ADF Statistic: -6.972989
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 47.000000
ADF Statistic: -2.108435
p-value: 0.241136
Critical Values:
	1%: -3.555
	5%: -2.916
	10%: -2.596
Column2 no: 48.000000
ADF Statistic: -4.754949
p-value: 0.000066
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column2 no: 49.000000
ADF Statistic: -8.947385
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 50.000000
ADF Statistic: -7.845440
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 51.000000
ADF Statistic: -6.035620
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 52.000000
ADF Statistic: -7.814204
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 53.000000
ADF Statistic: -8.281382
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 54.000000
ADF Statistic: -3.385010
p-value: 0.011481
Critical Values:
	1%: -3.553
	5%: -2.915
	10%: -2.595
Column2 no: 55.000000
ADF Statistic: -8.184586
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 56.000000
ADF Statistic: -7.816258
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 57.000000
ADF Statistic: -8.907242
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
Column2 no: 58.000000
ADF Statistic: -4.377272
p-value: 0.000326
Critical Values:
	1%: -3.551
	5%: -2.914
	10%: -2.595
Column2 no: 59.000000
ADF Statistic: -6.602987
p-value: 0.000000
Critical Values:
	1%: -3.546
	5%: -2.912
	10%: -2.594
In [238]:
#Statinary test for data
from statsmodels.tsa.stattools import adfuller
stationary_test=[]
for i in range(df.values.shape[1]):
    result=adfuller(df.values[i])
    stationary_test.append(result)
    print('Column no: %f' %i)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
Column no: 0.000000
ADF Statistic: -7.479514
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 1.000000
ADF Statistic: -7.484154
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 2.000000
ADF Statistic: -7.482057
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 3.000000
ADF Statistic: -7.487830
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 4.000000
ADF Statistic: -7.490618
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 5.000000
ADF Statistic: -7.498728
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 6.000000
ADF Statistic: -7.501188
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 7.000000
ADF Statistic: -7.497547
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 8.000000
ADF Statistic: -7.505938
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 9.000000
ADF Statistic: -7.501499
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 10.000000
ADF Statistic: -7.487867
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 11.000000
ADF Statistic: -7.481673
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 12.000000
ADF Statistic: -7.475956
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 13.000000
ADF Statistic: -7.492711
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 14.000000
ADF Statistic: -7.483697
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 15.000000
ADF Statistic: -7.489679
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 16.000000
ADF Statistic: -7.504898
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 17.000000
ADF Statistic: -7.494967
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 18.000000
ADF Statistic: -7.491407
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 19.000000
ADF Statistic: -7.501281
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 20.000000
ADF Statistic: -7.487235
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 21.000000
ADF Statistic: -7.488775
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 22.000000
ADF Statistic: -7.475103
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 23.000000
ADF Statistic: -7.478036
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 24.000000
ADF Statistic: -7.472849
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 25.000000
ADF Statistic: -7.471032
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 26.000000
ADF Statistic: -7.476766
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 27.000000
ADF Statistic: -7.488429
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 28.000000
ADF Statistic: -7.473660
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 29.000000
ADF Statistic: -7.480347
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 30.000000
ADF Statistic: -7.471111
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 31.000000
ADF Statistic: -7.491417
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 32.000000
ADF Statistic: -7.484816
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 33.000000
ADF Statistic: -7.484799
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 34.000000
ADF Statistic: -7.479580
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 35.000000
ADF Statistic: -7.479928
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 36.000000
ADF Statistic: -7.487338
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 37.000000
ADF Statistic: -7.473418
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 38.000000
ADF Statistic: -7.474487
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 39.000000
ADF Statistic: -7.463117
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 40.000000
ADF Statistic: -7.468418
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 41.000000
ADF Statistic: -7.485016
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 42.000000
ADF Statistic: -7.479275
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 43.000000
ADF Statistic: -7.469214
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 44.000000
ADF Statistic: -7.474453
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 45.000000
ADF Statistic: -7.472985
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 46.000000
ADF Statistic: -7.464098
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 47.000000
ADF Statistic: -7.460705
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 48.000000
ADF Statistic: -7.462293
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 49.000000
ADF Statistic: -7.459727
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 50.000000
ADF Statistic: -7.458120
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 51.000000
ADF Statistic: -7.463083
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 52.000000
ADF Statistic: -7.467293
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 53.000000
ADF Statistic: -7.456245
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 54.000000
ADF Statistic: -7.465805
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 55.000000
ADF Statistic: -7.470211
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 56.000000
ADF Statistic: -7.467319
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 57.000000
ADF Statistic: -7.467453
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 58.000000
ADF Statistic: -7.474889
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594
Column no: 59.000000
ADF Statistic: -7.464989
p-value: 0.000000
Critical Values:
	1%: -3.548
	5%: -2.913
	10%: -2.594

Autocorrelation Test: Autocorrelation were tested for the dataset.No Autocorrelation was detected.Corresponding charts are provide below.

In [239]:
#Autocorrelation
import statsmodels as sm
from statsmodels.graphics import tsaplots
auto_test=pd.DataFrame(data=np.zeros((6,60)),columns=df.columns)
for i in range(df.values.shape[1]):
    result=sm.tsa.stattools.acf(df.values[i],nlags=5)
    auto_test[df.columns[i]]=result
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(auto_test.columns[j*10:(j+1)*10]):
        sm.graphics.tsaplots.plot_acf(auto_test[col], ax=ax[i]) 
        ax[i].set_title(col)
    fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()
In [240]:
#Autocorrelation difference 1
import statsmodels as sm
from statsmodels.graphics import tsaplots
auto_test1=pd.DataFrame(data=np.zeros((6,60)),columns=df.columns)
for i in range(df.values.shape[1]):
    result=sm.tsa.stattools.acf(df_dif.values[i],nlags=5)
    auto_test1[df.columns[i]]=result
for j in range(0,6):
    fig, axes = plt.subplots(2,5, figsize=(15, 5))
    ax = axes.flatten()
    for i, col in enumerate(auto_test.columns[j*10:(j+1)*10]):
        sm.graphics.tsaplots.plot_acf(auto_test1[col], ax=ax[i]) 
        ax[i].set_title(col)
    fig.tight_layout(w_pad=6, h_pad=4)
    plt.show()

Summary Statistics for First difference: Summary statistics for first difference,positive values of first difference and negative values of the first difference are provided.No stock of unsually properties were observed(mean higher and lower than 0).All of the stocks seems to have similar variance and mean 0.Skewness,kurtosis,mode,median,standard deviation and variance for every stock was measured.No seems to stick out.For the correlation analysis stock with highest mean will be used.('ISYAT')

In [245]:
#Summary statistics for first difference
data=df.diff()
statistics_positive=data[data>0]
statistics_positive.describe()
Out[245]:
AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
count 18144.000000 18594.000000 17442.000000 14070.000000 14636.000000 11341.000000 13162.000000 17944.000000 17194.000000 16983.000000 ... 16656.000000 12119.000000 19413.000000 11599.000000 16946.000000 16303.000000 14887.000000 15517.000000 14877.000000 12066.000000
mean 0.002707 0.002222 0.001703 0.002982 0.002516 0.009556 0.002669 0.001989 0.000935 0.001958 ... 0.002721 0.002809 0.001250 0.003858 0.002441 0.001991 0.001331 0.002802 0.002395 0.004323
std 0.007720 0.006448 0.002719 0.008132 0.005679 0.006165 0.003865 0.003884 0.001895 0.004641 ... 0.007582 0.002822 0.002360 0.004207 0.006930 0.003166 0.002288 0.007681 0.005221 0.005006
min 0.000323 0.000825 0.000212 0.001927 0.000996 0.006698 0.001256 0.000341 0.000086 0.000137 ... 0.001143 0.001898 0.000047 0.001197 0.001187 0.000688 0.000262 0.001440 0.000514 0.002865
25% 0.001326 0.000890 0.000556 0.001927 0.001167 0.007643 0.001513 0.000757 0.000214 0.000654 ... 0.001361 0.001898 0.000486 0.002937 0.001279 0.000688 0.000272 0.001617 0.000997 0.002906
50% 0.001645 0.001693 0.000999 0.001927 0.002077 0.007986 0.001571 0.001676 0.000518 0.000988 ... 0.002150 0.001898 0.000729 0.002937 0.001319 0.000688 0.000534 0.001642 0.001102 0.003357
75% 0.003220 0.002670 0.002189 0.003854 0.002675 0.008587 0.002741 0.002270 0.000855 0.001963 ... 0.002721 0.001898 0.001436 0.002973 0.002598 0.002063 0.001602 0.003209 0.002635 0.004093
max 0.977849 0.842765 0.228543 0.932563 0.633515 0.145985 0.364877 0.468312 0.116361 0.443275 ... 0.920502 0.085389 0.252894 0.367625 0.870774 0.161623 0.058887 0.929939 0.509551 0.443448

8 rows × 60 columns

In [246]:
statistics=df.diff().describe()
statistics
Out[246]:
AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
count 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 ... 50011.000000 50011.000000 50011.000000 5.001100e+04 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000 50011.000000
mean -0.000001 0.000005 0.000010 -0.000005 0.000010 0.000001 0.000010 0.000010 0.000006 0.000007 ... 0.000004 0.000013 0.000015 8.555974e-08 0.000003 0.000011 0.000009 0.000001 0.000004 0.000003
std 0.006757 0.005733 0.002549 0.006501 0.004625 0.007496 0.003358 0.003520 0.001846 0.003756 ... 0.006516 0.002702 0.002348 3.843752e-03 0.005953 0.002836 0.001974 0.006226 0.004380 0.004523
min -0.899968 -0.786785 -0.172514 -0.934489 -0.599767 -0.128811 -0.363564 -0.403092 -0.159728 -0.355015 ... -0.920502 -0.079696 -0.251678 -3.646880e-01 -0.836453 -0.160935 -0.049791 -0.851086 -0.544692 -0.443448
25% -0.001364 -0.000934 -0.000589 -0.001927 -0.001138 0.000000 -0.001342 -0.001135 -0.000415 -0.000818 ... -0.001361 -0.001898 -0.000530 0.000000e+00 -0.001293 -0.000688 -0.000272 -0.001617 -0.000997 -0.002865
50% 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 0.001347 0.000934 0.000589 0.001927 0.001138 0.000000 0.001314 0.001135 0.000312 0.000661 ... 0.001361 0.000000 0.000531 0.000000e+00 0.001279 0.000688 0.000262 0.001617 0.000966 0.000000
max 0.977849 0.842765 0.228543 0.932563 0.633515 0.145985 0.364877 0.468312 0.116361 0.443275 ... 0.920502 0.085389 0.252894 3.676252e-01 0.870774 0.161623 0.058887 0.929939 0.509551 0.443448

8 rows × 60 columns

In [247]:
data=df.diff()
statistics_negative=data[data<0]
statistics_negative.describe()
Out[247]:
AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
count 18360.000000 18521.000000 18434.000000 14764.000000 15314.000000 11514.000000 13450.000000 18260.000000 18319.000000 19401.000000 ... 16947.000000 12643.000000 19459.000000 12070.000000 17120.000000 18266.000000 15507.000000 15643.000000 16053.000000 12650.000000
mean -0.002679 -0.002216 -0.001585 -0.002860 -0.002372 -0.009407 -0.002574 -0.001927 -0.000860 -0.001696 ... -0.002664 -0.002642 -0.001209 -0.003707 -0.002409 -0.001747 -0.001248 -0.002776 -0.002207 -0.004111
std 0.007146 0.006094 0.002318 0.007970 0.005230 0.005310 0.003703 0.003392 0.002091 0.003359 ... 0.007377 0.002591 0.002365 0.004020 0.006656 0.002548 0.002069 0.007066 0.004932 0.004718
min -0.899968 -0.786785 -0.172514 -0.934489 -0.599767 -0.128811 -0.363564 -0.403092 -0.159728 -0.355015 ... -0.920502 -0.079696 -0.251678 -0.364688 -0.836453 -0.160935 -0.049791 -0.851086 -0.544692 -0.443448
25% -0.003220 -0.002616 -0.001984 -0.003854 -0.002675 -0.008587 -0.002741 -0.002191 -0.000853 -0.001963 ... -0.002721 -0.001898 -0.001325 -0.002973 -0.002598 -0.002063 -0.001340 -0.003209 -0.002173 -0.004093
50% -0.001642 -0.001683 -0.000999 -0.001927 -0.001394 -0.007986 -0.001542 -0.001676 -0.000428 -0.000962 ... -0.002150 -0.001898 -0.000718 -0.002937 -0.001319 -0.000688 -0.000534 -0.001642 -0.001081 -0.003357
75% -0.001326 -0.000890 -0.000549 -0.001927 -0.001167 -0.007643 -0.001513 -0.000757 -0.000214 -0.000654 ... -0.001361 -0.001898 -0.000486 -0.002937 -0.001279 -0.000688 -0.000272 -0.001617 -0.000997 -0.002906
max -0.000323 -0.000825 -0.000212 -0.001927 -0.000996 -0.003177 -0.001256 -0.000344 -0.000098 -0.000190 ... -0.001143 -0.001898 -0.000048 -0.002937 -0.001187 -0.000688 -0.000262 -0.001440 -0.000514 -0.002865

8 rows × 60 columns

In [248]:
statistics.iloc[1,:].idxmax()
Out[248]:
'ISYAT'
In [250]:
data.mode()
Out[250]:
AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

1 rows × 60 columns

In [251]:
data.median()
Out[251]:
AEFES    0.0
AKBNK    0.0
AKSA     0.0
AKSEN    0.0
ALARK    0.0
ALBRK    0.0
ANACM    0.0
ARCLK    0.0
ASELS    0.0
ASUZU    0.0
AYGAZ    0.0
BAGFS    0.0
BANVT    0.0
BRISA    0.0
CCOLA    0.0
CEMAS    0.0
ECILC    0.0
EREGL    0.0
FROTO    0.0
GARAN    0.0
GOODY    0.0
GUBRF    0.0
HALKB    0.0
ICBCT    0.0
ISCTR    0.0
ISDMR    0.0
ISFIN    0.0
ISYAT    0.0
KAREL    0.0
KARSN    0.0
KCHOL    0.0
KRDMB    0.0
KRDMD    0.0
MGROS    0.0
OTKAR    0.0
PARSN    0.0
PETKM    0.0
PGSUS    0.0
PRKME    0.0
SAHOL    0.0
SASA     0.0
SISE     0.0
SKBNK    0.0
SODA     0.0
TCELL    0.0
THYAO    0.0
TKFEN    0.0
TOASO    0.0
TRKCM    0.0
TSKB     0.0
TTKOM    0.0
TUKAS    0.0
TUPRS    0.0
USAK     0.0
VAKBN    0.0
VESTL    0.0
YATAS    0.0
YKBNK    0.0
YUNSA    0.0
ZOREN    0.0
dtype: float64
In [252]:
data.var()
Out[252]:
AEFES    0.000046
AKBNK    0.000033
AKSA     0.000006
AKSEN    0.000042
ALARK    0.000021
ALBRK    0.000056
ANACM    0.000011
ARCLK    0.000012
ASELS    0.000003
ASUZU    0.000014
AYGAZ    0.000012
BAGFS    0.000012
BANVT    0.000007
BRISA    0.000014
CCOLA    0.000036
CEMAS    0.000011
ECILC    0.000008
EREGL    0.000004
FROTO    0.000009
GARAN    0.000028
GOODY    0.000073
GUBRF    0.000027
HALKB    0.000041
ICBCT    0.000008
ISCTR    0.000029
ISDMR    0.000006
ISFIN    0.000005
ISYAT    0.000014
KAREL    0.000008
KARSN    0.000020
KCHOL    0.000016
KRDMB    0.000024
KRDMD    0.000008
MGROS    0.000031
OTKAR    0.000013
PARSN    0.000005
PETKM    0.000005
PGSUS    0.000010
PRKME    0.000031
SAHOL    0.000039
SASA     0.000006
SISE     0.000006
SKBNK    0.000043
SODA     0.000003
TCELL    0.000012
THYAO    0.000009
TKFEN    0.000005
TOASO    0.000008
TRKCM    0.000006
TSKB     0.000039
TTKOM    0.000042
TUKAS    0.000007
TUPRS    0.000006
USAK     0.000015
VAKBN    0.000035
VESTL    0.000008
YATAS    0.000004
YKBNK    0.000039
YUNSA    0.000019
ZOREN    0.000020
dtype: float64
In [253]:
data.std()
Out[253]:
AEFES    0.006757
AKBNK    0.005733
AKSA     0.002549
AKSEN    0.006501
ALARK    0.004625
ALBRK    0.007496
ANACM    0.003358
ARCLK    0.003520
ASELS    0.001846
ASUZU    0.003756
AYGAZ    0.003406
BAGFS    0.003530
BANVT    0.002713
BRISA    0.003807
CCOLA    0.006014
CEMAS    0.003252
ECILC    0.002779
EREGL    0.002114
FROTO    0.002977
GARAN    0.005280
GOODY    0.008538
GUBRF    0.005215
HALKB    0.006411
ICBCT    0.002740
ISCTR    0.005395
ISDMR    0.002459
ISFIN    0.002244
ISYAT    0.003785
KAREL    0.002781
KARSN    0.004477
KCHOL    0.004030
KRDMB    0.004923
KRDMD    0.002869
MGROS    0.005533
OTKAR    0.003643
PARSN    0.002145
PETKM    0.002228
PGSUS    0.003188
PRKME    0.005564
SAHOL    0.006225
SASA     0.002383
SISE     0.002451
SKBNK    0.006570
SODA     0.001810
TCELL    0.003500
THYAO    0.003082
TKFEN    0.002324
TOASO    0.002876
TRKCM    0.002469
TSKB     0.006262
TTKOM    0.006516
TUKAS    0.002702
TUPRS    0.002348
USAK     0.003844
VAKBN    0.005953
VESTL    0.002836
YATAS    0.001974
YKBNK    0.006226
YUNSA    0.004380
ZOREN    0.004523
dtype: float64
In [254]:
#Skews to right when positive,to left when negative
data.skew()
Out[254]:
AEFES     13.342564
AKBNK     11.810276
AKSA       8.425760
AKSEN     -0.414508
ALARK      7.871218
ALBRK      0.611683
ANACM      0.473930
ARCLK     17.144157
ASELS    -12.528670
ASUZU     17.781964
AYGAZ     29.146739
BAGFS   -115.506753
BANVT      2.347752
BRISA      0.132926
CCOLA      6.408829
CEMAS    -11.343028
ECILC      6.846357
EREGL      0.318090
FROTO      4.478731
GARAN     10.408979
GOODY     -0.005829
GUBRF      0.454105
HALKB     11.204455
ICBCT    -10.998863
ISCTR     18.982775
ISDMR      6.174451
ISFIN    -15.173323
ISYAT     18.462930
KAREL      0.974498
KARSN      0.494337
KCHOL      9.006304
KRDMB      8.366308
KRDMD      5.092583
MGROS      2.605187
OTKAR     10.392063
PARSN     -6.360402
PETKM      0.308251
PGSUS     -0.061565
PRKME     -0.707580
SAHOL      8.260224
SASA       1.059227
SISE      20.688893
SKBNK      0.236488
SODA       5.524513
TCELL     52.991215
THYAO      0.299846
TKFEN      4.089045
TOASO     14.968348
TRKCM     35.805049
TSKB       0.721176
TTKOM      0.487506
TUKAS      0.195159
TUPRS      0.176209
USAK       0.360454
VAKBN      7.097173
VESTL      1.489574
YATAS      1.009304
YKBNK     15.532139
YUNSA     -5.844550
ZOREN     -0.067330
dtype: float64
In [131]:
#Narrower bell shape when negative,wider bell when positive
df_dif_num.kurtosis()
Out[131]:
AEFES   -1.629918
AKBNK   -1.652585
AKSA    -1.604290
AKSEN   -1.264739
ALARK   -1.329400
ALBRK   -0.811784
ANACM   -1.120550
ARCLK   -1.618506
ASELS   -1.589496
ASUZU   -1.615060
AYGAZ   -1.568923
BAGFS   -1.509655
BANVT   -1.495501
BRISA   -1.505834
CCOLA   -1.709216
CEMAS   -0.791281
ECILC   -1.198456
EREGL   -1.537100
FROTO   -1.709151
GARAN   -1.660417
GOODY   -1.475634
GUBRF   -1.416043
HALKB   -1.560887
ICBCT   -1.072661
ISCTR   -1.524183
ISDMR    2.618904
ISFIN   -0.607593
ISYAT   -0.021914
KAREL   -1.261153
KARSN   -0.833144
KCHOL   -1.655714
KRDMB   -1.165188
KRDMD   -1.101239
MGROS   -1.631327
OTKAR   -1.698935
PARSN   -1.492878
PETKM   -1.393785
PGSUS   -1.596284
PRKME   -1.285143
SAHOL   -1.632533
SASA    -1.317361
SISE    -1.416988
SKBNK   -0.817996
SODA    -1.382325
TCELL   -1.547758
THYAO   -1.645781
TKFEN   -1.629206
TOASO   -1.644768
TRKCM   -1.242317
TSKB    -0.817556
TTKOM   -1.511603
TUKAS   -0.979836
TUPRS   -1.713499
USAK    -0.886661
VAKBN   -1.531930
VESTL   -1.546372
YATAS   -1.353930
YKBNK   -1.395036
YUNSA   -1.380651
ZOREN   -0.975972
dtype: float64

Correlation analysis: Three interesting stocks were selected(ALBRK,TUKAS,ISYAT).Their correlations will be calculated and days with highest correlation change and highest correlation will be provided for google trends analysis.Instead of 15 minutes data daily high prices were used in order to better grasp the data.

In [255]:
#Correlation of selected stock
from datetime import datetime,date,timedelta
df_daily=df.resample('1d').max()
df_daily=df_daily.dropna()
cor=df_daily['ALBRK'].rolling(30).corr(df_daily['TUKAS'])
print('Highest correlation window',cor.idxmax(),cor.idxmax()+timedelta(days=30))
print('Lowest correlation window',cor.idxmin(),cor.idxmin()+timedelta(days=30))
print('Highest correlation increase',cor.diff().idxmax())
print('Highest correlation decrease',cor.diff().idxmin())
cor.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('date', fontsize=20)
plt.show()
sns.kdeplot(data=cor).set_title('Correlation distribution')
plt.show()
sns.histplot(data=cor).set_title('Correlation histogram')
Highest correlation window 2013-09-09 00:00:00+00:00 2013-10-09 00:00:00+00:00
Lowest correlation window 2015-07-23 00:00:00+00:00 2015-08-22 00:00:00+00:00
Highest correlation increase 2012-12-05 00:00:00+00:00
Highest correlation decrease 2015-09-18 00:00:00+00:00
Out[255]:
Text(0.5, 1.0, 'Correlation histogram')
In [160]:
from datetime import datetime,date,timedelta
df_daily=df.resample('1d').max()
df_daily=df_daily.dropna()
cor=df_daily['ALBRK'].rolling(30).corr(df_daily['ISYAT'])
print('Highest correlation window',cor.idxmax(),cor.idxmax()+timedelta(days=30))
print('Lowest correlation window',cor.idxmin(),cor.idxmin()+timedelta(days=30))
print('Highest correlation increase',cor.diff().idxmax())
print('Highest correlation decrease',cor.diff().idxmin())
cor.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('date', fontsize=20)
plt.show()
sns.kdeplot(data=cor).set_title('Correlation distribution')
plt.show()
sns.histplot(data=cor).set_title('Correlation histogram')
Highest correlation window 2013-11-12 00:00:00+00:00 2013-12-12 00:00:00+00:00
Lowest correlation window 2014-04-18 00:00:00+00:00 2014-05-18 00:00:00+00:00
Highest correlation increase 2014-05-13 00:00:00+00:00
Highest correlation decrease 2014-03-28 00:00:00+00:00
Out[160]:
Text(0.5, 1.0, 'Correlation histogram')
In [256]:
from datetime import datetime,date,timedelta
df_daily=df.resample('1d').max()
df_daily=df_daily.dropna()
cor=df_daily['TUKAS'].rolling(30).corr(df_daily['ISYAT'])
print('Highest correlation window',cor.idxmax(),cor.idxmax()+timedelta(days=30))
print('Lowest correlation window',cor.idxmin(),cor.idxmin()+timedelta(days=30))
print('Highest correlation increase',cor.diff().idxmax())
print('Highest correlation decrease',cor.diff().idxmin())
cor.plot(figsize=(20,10), linewidth=2, fontsize=20)
plt.xlabel('date', fontsize=20)
plt.show()
sns.kdeplot(data=cor).set_title('Correlation distribution')
plt.show()
sns.histplot(data=cor).set_title('Correlation histogram')
Highest correlation window 2019-02-27 00:00:00+00:00 2019-03-29 00:00:00+00:00
Lowest correlation window 2016-02-02 00:00:00+00:00 2016-03-03 00:00:00+00:00
Highest correlation increase 2012-12-05 00:00:00+00:00
Highest correlation decrease 2017-03-29 00:00:00+00:00
Out[256]:
Text(0.5, 1.0, 'Correlation histogram')

Principal Component Analysis: PCA was implemented on the data.Principal comonents and relative variation explained by them are provided.Data points were transformed to the new coordinate system created by PCA.This example is not suitable for PCA analysis.The data in question is not stationary and it is likely to have non-linear relations between variables.PCA rotates the axis of the coordinate system so that it encapsulates the highest variance possible to the first variable and least possible to the last.However if there is stationarity in the data it learns to compress the information on the given segment instead of the whole process.PCA assumes there is linear relationships between variables and consequently non linear relationships are dropped out with the less important axises and this causes loss of information.In this dataset PCA learns the wrong representation due to stationarity and causes loss of information due to non linear relations.This kind of data,Multivariate,unstationary time series with non linear relationships between variables are not suitable to PCA aplications.PCA should not be implemented on it.After getting rid of stationarity CCA or autoencoders may be implemented.Nearly all of the variance seems to be explained by the few latent components.This has two possible explanations.The data in question is random and does not posess valuable information or due to non linearity high variation latent components cannot be highlighted by PCA.

In [257]:
#Principal componenet analysis
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(df.values)
PCA()
plt.hist(pca.explained_variance_ratio_)
x=pca.transform(df.values)#principal components values
print(pca.explained_variance_ratio_)
print(pca.singular_values_)
[6.58863790e-01 1.07485607e-01 8.43247899e-02 3.53499550e-02
 2.74745561e-02 1.72708027e-02 1.03153939e-02 8.63891371e-03
 6.48636013e-03 5.24644083e-03 4.52317389e-03 4.25063885e-03
 3.66752204e-03 2.47334962e-03 2.31441953e-03 2.03373472e-03
 1.96175717e-03 1.71469848e-03 1.39712113e-03 1.08722634e-03
 1.03570941e-03 9.99550260e-04 8.93125883e-04 7.81297661e-04
 7.39177873e-04 6.88579933e-04 6.47041397e-04 6.18726677e-04
 5.67825161e-04 5.32537973e-04 4.90834089e-04 4.02587758e-04
 3.79261767e-04 3.67149030e-04 3.59495875e-04 3.05299704e-04
 2.79702928e-04 2.63909073e-04 2.49743125e-04 2.41852386e-04
 2.21878632e-04 1.89886327e-04 1.83216237e-04 1.78608571e-04
 1.59028334e-04 1.56201884e-04 1.36731662e-04 1.30746877e-04
 1.14151929e-04 1.09685630e-04 1.02515131e-04 9.65884054e-05
 8.35535741e-05 7.45783654e-05 7.09292045e-05 6.99109818e-05
 6.25604504e-05 5.88397463e-05 4.01747997e-05 3.65543704e-05]
[246.0554475   99.38259242  88.0263711   56.99406028  50.24588573
  39.83744036  30.78775864  28.17505241  24.41381748  21.95672457
  20.38716353  19.76342765  18.3578379   15.07571503  14.58331332
  13.67043442  13.42634458  12.55246579  11.33058139   9.99528418
   9.75560314   9.58379427   9.05923451   8.47312135   8.24156415
   7.9544901    7.71083124   7.54022986   7.22341357   6.99536621
   6.71587375   6.08226652   5.90343378   5.80839786   5.74754156
   5.29661439   5.06971646   4.92450208   4.79051204   4.7142255
   4.51536544   4.17716867   4.10314764   4.05122452   3.82271938
   3.78859602   3.54462022   3.46617772   3.23874527   3.17475366
   3.06922799   2.9791863    2.77087976   2.61783093   2.55298174
   2.53459086   2.3976462    2.32525486   1.92137451   1.83275671]
In [136]:
df_dif_num.head()
Out[136]:
AEFES AKBNK AKSA AKSEN ALARK ALBRK ANACM ARCLK ASELS ASUZU ... TTKOM TUKAS TUPRS USAK VAKBN VESTL YATAS YKBNK YUNSA ZOREN
timestamp
2012-09-17 06:45:00+00:00 0.0 -1.0 -1.0 -1.0 -1.0 -1.0 0.0 -1.0 0.0 1.0 ... -1.0 0.0 -1.0 0.0 -1.0 0.0 1.0 -1.0 -1.0 0.0
2012-09-17 07:00:00+00:00 0.0 -1.0 -1.0 -1.0 -1.0 -1.0 0.0 -1.0 0.0 1.0 ... -1.0 0.0 -1.0 0.0 -1.0 0.0 1.0 -1.0 -1.0 0.0
2012-09-17 07:15:00+00:00 0.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 -1.0 -1.0 ... 0.0 1.0 -1.0 1.0 -1.0 1.0 0.0 0.0 1.0 0.0
2012-09-17 07:30:00+00:00 0.0 -1.0 0.0 0.0 -1.0 -1.0 0.0 -1.0 0.0 1.0 ... 0.0 0.0 1.0 -1.0 1.0 0.0 1.0 1.0 1.0 1.0
2012-09-17 07:45:00+00:00 1.0 1.0 0.0 1.0 1.0 1.0 1.0 1.0 -1.0 0.0 ... 0.0 0.0 1.0 0.0 1.0 -1.0 0.0 0.0 0.0 -1.0

5 rows × 60 columns

Google Trends: Search volume of the given companies in the highest correlation window will be provided.A day was observed in plots in which there were considerable change,the day in question will be analyzed

In [224]:
google_trend=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (1).csv", header=1,index_col=0)
# Inspect data
print(google_trend.head())
google_trend.plot()
google_trend.corr()
            İş Yatırım: (Türkiye)  Albaraka Türk Katılım Bankası: (Türkiye)
Gün                                                                        
2013-09-09                     17                                        77
2013-09-10                     19                                        69
2013-09-11                     21                                        71
2013-09-12                     27                                        57
2013-09-13                     28                                        64
Out[224]:
İş Yatırım: (Türkiye) Albaraka Türk Katılım Bankası: (Türkiye)
İş Yatırım: (Türkiye) 1.000000 0.742652
Albaraka Türk Katılım Bankası: (Türkiye) 0.742652 1.000000

As it can be seen, there is a strong correlation between searches in the period where window correlation of the two stocks are highest.

In [225]:
google_trend_2=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (2).csv", header=1,index_col=0)
# Inspect data
print(google_trend_2.head())
google_trend_2.plot()
google_trend_2.corr()
            İş Yatırım: (Türkiye)  \
Gün                                 
2019-02-27                     64   
2019-02-28                     81   
2019-03-01                     45   
2019-03-02                     27   
2019-03-03                     23   

            TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye)  
Gün                                                       
2019-02-27                                             0  
2019-02-28                                            19  
2019-03-01                                             0  
2019-03-02                                             0  
2019-03-03                                             0  
Out[225]:
İş Yatırım: (Türkiye) TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye)
İş Yatırım: (Türkiye) 1.000000 0.021554
TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye) 0.021554 1.000000

Although there is strong correlation in this period for the given stocks,no strong correlation was observed in search volumes.

In [226]:
google_trend_3=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (3).csv", header=1,index_col=0)
# Inspect data
print(google_trend_3.head())
google_trend_3.plot()
google_trend_3.corr()
            Albaraka Türk Katılım Bankası: (Türkiye)  \
Gün                                                    
2013-09-09                                        77   
2013-09-10                                        69   
2013-09-11                                        70   
2013-09-12                                        61   
2013-09-13                                        68   

            TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye)  
Gün                                                       
2013-09-09                                             0  
2013-09-10                                             0  
2013-09-11                                             0  
2013-09-12                                             7  
2013-09-13                                             0  
Out[226]:
Albaraka Türk Katılım Bankası: (Türkiye) TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye)
Albaraka Türk Katılım Bankası: (Türkiye) 1.000000 -0.124751
TUKAŞ GIDA SANAYİ VE TİCARET A.Ş.: (Türkiye) -0.124751 1.000000

Although there is strong correlation in this period for the given stocks,no strong correlation was observed in search volumes.

Now we will investigate the changes in search volume of words "borsa","yatırım", and "para" in the vicinity of the most volatile day.

In [227]:
df_dif.idxmax()
Out[227]:
AEFES   2013-05-07 06:30:00+00:00
AKBNK   2013-05-07 06:30:00+00:00
AKSA    2013-05-07 06:30:00+00:00
AKSEN   2013-05-07 06:30:00+00:00
ALARK   2013-05-07 06:30:00+00:00
ALBRK   2019-02-13 07:00:00+00:00
ANACM   2013-05-07 06:30:00+00:00
ARCLK   2013-05-07 06:30:00+00:00
ASELS   2013-05-07 06:30:00+00:00
ASUZU   2013-05-07 06:30:00+00:00
AYGAZ   2013-05-07 06:30:00+00:00
BAGFS   2013-05-07 06:30:00+00:00
BANVT   2013-05-07 06:30:00+00:00
BRISA   2013-05-07 06:30:00+00:00
CCOLA   2013-05-07 06:30:00+00:00
CEMAS   2013-05-07 06:30:00+00:00
ECILC   2013-05-07 06:30:00+00:00
EREGL   2013-05-07 06:30:00+00:00
FROTO   2013-05-07 06:30:00+00:00
GARAN   2013-05-07 06:30:00+00:00
GOODY   2015-07-15 12:00:00+00:00
GUBRF   2013-12-05 15:30:00+00:00
HALKB   2013-05-07 06:30:00+00:00
ICBCT   2013-05-07 06:30:00+00:00
ISCTR   2013-05-07 06:30:00+00:00
ISDMR   2018-03-30 11:00:00+00:00
ISFIN   2013-05-07 06:30:00+00:00
ISYAT   2013-05-07 06:30:00+00:00
KAREL   2013-05-07 06:30:00+00:00
KARSN   2013-05-07 06:30:00+00:00
KCHOL   2013-05-07 06:30:00+00:00
KRDMB   2013-05-07 06:30:00+00:00
KRDMD   2013-05-07 06:30:00+00:00
MGROS   2013-05-07 06:30:00+00:00
OTKAR   2013-05-07 06:30:00+00:00
PARSN   2013-05-07 06:30:00+00:00
PETKM   2013-05-07 06:30:00+00:00
PGSUS   2013-05-07 06:30:00+00:00
PRKME   2013-05-07 06:30:00+00:00
SAHOL   2013-05-07 06:30:00+00:00
SASA    2018-06-25 06:45:00+00:00
SISE    2013-05-07 06:30:00+00:00
SKBNK   2013-05-07 06:30:00+00:00
SODA    2013-05-07 06:30:00+00:00
TCELL   2013-05-07 06:30:00+00:00
THYAO   2013-05-07 06:30:00+00:00
TKFEN   2013-05-07 06:30:00+00:00
TOASO   2013-05-07 06:30:00+00:00
TRKCM   2013-05-07 06:30:00+00:00
TSKB    2013-05-07 06:30:00+00:00
TTKOM   2013-05-07 06:30:00+00:00
TUKAS   2019-03-04 07:00:00+00:00
TUPRS   2013-05-07 06:30:00+00:00
USAK    2013-05-07 06:30:00+00:00
VAKBN   2013-05-07 06:30:00+00:00
VESTL   2013-05-07 06:30:00+00:00
YATAS   2017-11-20 06:45:00+00:00
YKBNK   2013-05-07 06:30:00+00:00
YUNSA   2013-05-07 06:30:00+00:00
ZOREN   2013-05-07 06:30:00+00:00
dtype: datetime64[ns, UTC]
In [228]:
google_trend_4=pd.read_csv("C:/Users/kteke/Downloads/multiTimeline (4).csv", header=1,index_col=0)
# Inspect data
google_trend_4.plot()
Out[228]:
<AxesSubplot:xlabel='Gün'>

Increase in the search volume of the provided words can be observed.This indicates these search terms may be related to significant changes in the stock market.

Final Remarks: Financial data posseses complex non linear relations and when detrended it is likely to follow a random walk.However with data mining tecniques valuable information that is kept hidden in the data may be recovered.The dataset contained lots of nan values filling them might have caused us to uncover non existing patterns or to miss some patterns that could have been discovered.